8-Bit Approximations for Parallelism in Deep Learning
Abstract
The creation of practical deep learning data-products often requires parallelization across processors and computers to make deep learning feasible on large data sets, but bottlenecks in communication bandwidth make it difficult to attain good speedups through parallelism. Here we develop and test 8-bit approximation algorithms that make better use of the available bandwidth by compressing 32-bit gradients and nonlinear activations to 8-bit approximations. We show that these approximations do not decrease predictive performance on MNIST, CIFAR10, and ImageNet for both model and data parallelism and provide a data transfer speedup of 2x relative to 32-bit parallelism. We build a predictive model for speedups based on our experimental data, verify its validity on known speedup data, and show that we can obtain a speedup of 50x and more on a system of 96 GPUs, compared to a speedup of 23x for 32-bit. We compare our data types with other methods and show that 8-bit approximations achieve state-of-the-art speedups for model parallelism. Thus 8-bit approximation is an efficient method to parallelize convolutional networks on very large systems of GPUs.
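The core idea of compressing 32-bit gradients to 8-bit approximations before transfer can be sketched as follows. This is a minimal illustration using a simple per-tensor linear quantizer, not the paper's actual 8-bit data types (which are more elaborate); the function names are hypothetical.

```python
import numpy as np

def quantize_8bit(grad):
    """Compress a float32 gradient tensor to an int8 payload plus a
    per-tensor scale. Hypothetical linear quantizer for illustration;
    the paper's 8-bit data types are more sophisticated."""
    scale = float(np.max(np.abs(grad))) / 127.0
    if scale == 0.0:
        return np.zeros(grad.shape, dtype=np.int8), 1.0
    q = np.clip(np.round(grad / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_8bit(q, scale):
    """Recover an approximate float32 tensor from the 8-bit payload."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
g = rng.standard_normal(1024).astype(np.float32) * 0.01
q, s = quantize_8bit(g)
g_hat = dequantize_8bit(q, s)
print(q.nbytes, g.nbytes)  # 1024 4096: 4x fewer bytes on the wire
```

Only the int8 tensor and one float scale cross the interconnect, which is where the bandwidth saving (and hence the speedup) comes from; the rounding error per element is bounded by half the scale.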
Similar Papers
1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs
We show empirically that in SGD training of deep neural networks, one can, at no or nearly no loss of accuracy, quantize the gradients aggressively—to but one bit per value—if the quantization error is carried forward across minibatches (error feedback). This size reduction makes it feasible to parallelize SGD through data-parallelism with fast processors like recent GPUs. We implement data-par...
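The error-feedback trick described above can be sketched in a few lines: the quantization error from one minibatch is added to the next gradient before quantizing again, so the error does not accumulate. This is an illustrative sketch only; the paper's column-wise scales and other details are omitted, and the function name is hypothetical.

```python
import numpy as np

def one_bit_quantize(grad, residual):
    """Quantize a gradient to one bit per value with error feedback.

    Sketch of the idea in 1-bit SGD: add the carried-over quantization
    error to the current gradient, transmit only the sign plus one
    shared magnitude, and store the new residual for the next minibatch.
    """
    corrected = grad + residual                  # error feedback
    scale = float(np.mean(np.abs(corrected)))    # one shared magnitude
    q = np.where(corrected >= 0, scale, -scale)  # 1 bit per value + scale
    residual = corrected - q                     # error carried forward
    return q, residual

rng = np.random.default_rng(1)
residual = np.zeros(8)
for _ in range(100):
    g = rng.standard_normal(8)
    q, residual = one_bit_quantize(g, residual)
# The residual stays bounded across minibatches instead of growing,
# which is why accuracy is largely preserved despite 1-bit transfers.
```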
Deep Learning and Its Parallelization: Concepts and Instances
Contents (excerpt): 1 Introduction; 1.1 Application Background; 1.2 Performance Demands for Deep L...
Porosity classification from thin sections using image analysis and neural networks including shallow and deep learning in Jahrum formation
The porosity within a reservoir rock is a basic parameter for reservoir characterization. The present paper introduces two intelligent models for identification of porosity types using image analysis. To this end, thirteen geometrical parameters of the pores in each image were first extracted using image analysis techniques. The extracted features and their corresponding pore types ...
The Application of Least Square Support Vector Machine as a Mathematical Algorithm for Diagnosing Drilling Effectivity in Shaly Formations
The problem of slow drilling in deep shale formations occurs worldwide, causing significant expenses to the oil industry. Bit balling is widely considered the main cause of poor bit performance in shales, especially when deep shales are drilled with water-based mud. Therefore, efforts have been made to develop a model to diagnose drilling effectivity. Hence, we arrived at graphical cor...
BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
We introduce BinaryNet, a method which trains DNNs with binary weights and activations when computing parameters’ gradient. We show that it is possible to train a Multi Layer Perceptron (MLP) on MNIST and ConvNets on CIFAR-10 and SVHN with BinaryNet and achieve nearly state-of-the-art results. At run-time, BinaryNet drastically reduces memory usage and replaces most multiplications by 1-bit exc...
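The binarization described above can be illustrated with a deterministic sign function for the forward pass and a straight-through estimator for the backward pass. This is a sketch of the general technique, not BinaryNet's full training procedure; the function names are hypothetical.

```python
import numpy as np

def binarize(w):
    """Deterministic binarization to +1/-1, as used in the forward pass."""
    return np.where(w >= 0, 1.0, -1.0)

def straight_through_grad(w, grad_out):
    """Straight-through estimator: pass the incoming gradient through the
    non-differentiable sign function unchanged, but cancel it where
    |w| > 1 (the hard-tanh window), so large weights stop updating."""
    return grad_out * (np.abs(w) <= 1.0)

w = np.array([-1.5, -0.3, 0.0, 0.4, 2.0])
wb = binarize(w)                  # [-1., -1., 1., 1., 1.]
gw = straight_through_grad(w, np.ones_like(w))  # [0., 1., 1., 1., 0.]
```

Because every weight and activation is one of two values, multiplications in the forward pass reduce to sign flips and accumulations, which is the source of the memory and compute savings claimed in the abstract.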
Journal: CoRR
Volume: abs/1511.04561
Pages: -
Publication date: 2015